16 research outputs found

    Bayesian nonparametric models for peak identification in MALDI-TOF mass spectroscopy

    We present a novel nonparametric Bayesian approach based on Lévy Adaptive Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This model-based approach provides identification and quantification of proteins through model parameters that are directly interpretable as the number of proteins, the mass and abundance of proteins, and peak resolution, while having the ability to adapt to unknown smoothness as in wavelet-based methods. Informative prior distributions on resolution are key to distinguishing true peaks from background noise and to resolving broad peaks into individual peaks for multiple protein species. Posterior distributions are obtained using a reversible jump Markov chain Monte Carlo algorithm and provide inference about the number of peaks (proteins), their masses and abundance. We show through simulation studies that the procedure has desirable true-positive and false-discovery rates. Finally, we illustrate the method on five example spectra: a blank spectrum, a spectrum with only the matrix of a low-molecular-weight substance used to embed target proteins, a spectrum with known proteins, and a single spectrum and average of ten spectra from an individual lung cancer patient. Comment: Published at http://dx.doi.org/10.1214/10-AOAS450 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Nonparametric Models for Peak Identification and Quantification in MALDI-TOF Mass Spectroscopy

    We present a novel nonparametric Bayesian model using Lévy random field priors for identifying the presence and abundance of proteins from mass spectrometry data. Informed prior distributions, based on expert opinion and on preliminary laboratory experiments, help distinguish true peaks from background noise and help resolve uncertainty about peak multiplicity.

    TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2

    We present a novel nonparametric Bayesian approach based on Lévy Adaptive Regression Kernels (LARK) to model spectral data arising from MALDI-TOF (Matrix Assisted Laser Desorption Ionization Time-of-Flight) mass spectrometry. This model-based approach provides identification and quantification of proteins through model parameters that are directly interpretable as the number of proteins, the mass and abundance of proteins, and peak resolution. Informed prior distributions, based on expert opinion and on preliminary laboratory experiments, help to distinguish true peaks from background noise and help resolve uncertainty about peak multiplicity. Posterior distributions are obtained using a reversible jump Markov chain Monte Carlo algorithm and provide inference about the number of peaks (proteins), their masses and abundance. We show through simulation studies that the procedure has desirable true-positive and false-discovery rates. Finally, we illustrate the method on four example spectra: a blank spectrum, a spectrum with only the matrix of a low-molecular-weight substance used to embed target proteins, and a single spectrum and average of ten spectra from an individual lung cancer patient.

    Nonparametric models for proteomic peak identification and quantification. Bayesian Inference for Gene Expression and Proteomics

    We present model-based inference for proteomic peak identification and quantification from mass spectroscopy data, focusing on nonparametric Bayesian models. Using experimental data generated from MALDI-TOF (Matrix Assisted Laser Desorption Ionization Time-of-Flight) mass spectroscopy, we model observed intensities in spectra with a hierarchical nonparametric model for expected intensity as a function of time-of-flight. We express the unknown intensity function as a sum of kernel functions, a natural choice of basis functions for modelling spectral peaks. We discuss how to place prior distributions on the unknown functions using Lévy random fields and describe posterior inference via a reversible jump Markov chain Monte Carlo algorithm.
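
    The "sum of kernel functions" representation in this abstract can be illustrated with a minimal sketch. The abstract does not say which kernel is used, so the Gaussian shape, the function names, and the example peak values below are all assumptions chosen for illustration, not the authors' model.

```python
import math


def gaussian_kernel(t, location, width):
    """Unnormalised Gaussian bump centred at `location` (assumed kernel shape)."""
    return math.exp(-0.5 * ((t - location) / width) ** 2)


def expected_intensity(t, peaks):
    """Expected intensity at time-of-flight t as a weighted sum of kernels.

    `peaks` is a list of (location, abundance, width) tuples; these names
    are hypothetical and do not come from the abstract.
    """
    return sum(
        abundance * gaussian_kernel(t, location, width)
        for location, abundance, width in peaks
    )


# Two made-up peaks: (location, abundance, width).
example_peaks = [(10.0, 2.0, 1.0), (20.0, 5.0, 2.0)]
intensity_at_first_peak = expected_intensity(10.0, example_peaks)
```

    In a sketch like this, the number of tuples in `peaks` plays the role of the unknown number of proteins, which is the quantity a reversible jump sampler would vary.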

    Predicting station locations in bike-sharing systems using a proposed quality-of-service measurement: Methodology and case study

    Bike-sharing system (BSS) operators tend to spend a great amount of time and effort to satisfy users. Accurately measuring the quality-of-service (QoS) of each station in a BSS will advance this mission. Moreover, measuring the QoS and using it to study the spatial dependencies in a BSS allows operators to better manage the system. The traditional QoS measurement reported in the literature is based on the proportion of problematic stations, which are defined as those with no bikes or docks available to users. The authors investigated the traditional QoS measurement and found that it neither exposes the spatial dependencies between stations nor discriminates between stations in a BSS. This study proposes a novel QoS measurement, namely Optimal Occupancy, that captures the impact of heterogeneity in BSSs and reflects the spatial dependencies between stations. Optimal Occupancy is defined as the ratio of the total time a station is functional during a given interval to the length of the interval; it also redefines problematic stations. The authors applied geo-statistics to explore the spatial configuration of Optimal Occupancy variations and to model variograms for spatial prediction. Results revealed that Optimal Occupancy is beneficial for operators, would result in better prediction of the QoS at nearby locations, and can be used to predict candidate spots for new stations in an existing BSS. For example, the proposed QoS for Station 50 improved after a new nearby station was added, increasing from 0.52 to 0.84 for a Monday and a Tuesday in July, respectively.
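
    The ratio that defines Optimal Occupancy can be sketched in a few lines. The abstract does not state when a station counts as functional, so this sketch assumes a station is functional whenever it has at least one bike and at least one free dock, and that bike counts are sampled at equally spaced times. The function name and sample data are hypothetical.

```python
def optimal_occupancy(bike_counts, capacity):
    """Fraction of equally spaced samples at which the station is functional.

    Assumption (not from the abstract): functional means at least one bike
    and at least one free dock, i.e. 0 < bikes < capacity.
    """
    if not bike_counts:
        raise ValueError("need at least one sample")
    functional = sum(1 for bikes in bike_counts if 0 < bikes < capacity)
    return functional / len(bike_counts)


# Made-up samples for a 10-dock station: empty once, full once.
ratio = optimal_occupancy([0, 3, 5, 10], capacity=10)
```

    With equally spaced samples, the share of functional samples approximates the ratio of functional time to interval length described in the abstract.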

    Network and station-level bike-sharing system prediction: a San Francisco bay area case study

    The paper develops models of bike availability in the San Francisco Bay Area Bike Share System (BSS), applying machine learning at two levels: network and station. Investigating BSSs at the station level is the full problem, and it would provide policymakers, planners, and operators with the level of detail needed to make important choices and conclusions. We used Random Forest and Least-Squares Boosting as univariate regression algorithms to model the number of available bikes at the station level. For the multivariate regression, we applied Partial Least-Squares Regression (PLSR) to reduce the number of prediction models needed and to reproduce the spatiotemporal interactions among stations in the system at the network level. Although prediction errors were slightly lower for the univariate models, we found that the multivariate model results were promising for network-level prediction, especially in systems with a relatively large number of spatially correlated stations. Moreover, results of the station-level analysis suggested that demographic information and other environmental variables were significant factors in modeling bike availability in BSSs. We also demonstrated that the available bikes modeled at the station level at time (Formula presented.) had a notable influence on the bike count models. Station neighbors and prediction horizon times were found to be significant predictors, with 15 minutes being the most effective prediction horizon time.